2020-07-10

Multithreading with global state

The first time I wrote multi-threaded code in Jai was while building Dark_Matter, a compiler for my custom documentation format of the same name. The initial version of the parser module was written without any consideration for multi-threading, and the fix for that blunder was surprisingly simple, so much so that I've decided to write about it here.

One core

When I was building the first version of the Dark_Matter compiler I used a lot of global variables during parsing. Here's what the first version of the parser state looked like:

#scope_file // Internal state:

merge_code_lines     := false;
last_empty           := false;
collect_empty        := false;
has_metadata_block   := false;
last_indentation     := 0;
current_indentation  := 0;
source_line          := 0;
read_time            := 0;

source_file : string;
document    : *Document;
last_node   : *Node;
node_pool   : Pool(Node, 128);
text_pool   : Pool(Text, 512);

#scope_file just means these variables cannot be accessed outside of this file. If this file is loaded into another, that file will not be able to access any of these variables. This state is global to the file it resides in.

This version of Dark_Matter only ran on the main thread, so this way of storing the parser state was completely reasonable. However, soon after finishing the first version I realized that generating a website recursively meant parsing many different files, and this parsing can be done completely independently.

It's a very obvious case for multi-threading, so I added a thread pool and a job system in which each thread parses one file at a time. I ran it and it crashed. Of course it did! The parser state is global, so every thread in the pool was reading and writing the same shared data.
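To make the failure concrete, here is a minimal sketch of the same mistake in Python rather than Jai. The names parse_shared and run_shared and the .dm file names are made up for illustration and are not part of Dark_Matter. The barrier exists only to force the two jobs to overlap, so the clobbering happens on every run instead of once in a while.

```python
import threading

# Module-level parser state, shared by every thread (the problematic design).
source_file = ""


def parse_shared(name, barrier, seen):
    """Hypothetical parse job that keeps its state in a global."""
    global source_file
    source_file = name        # each job records which file it is parsing
    barrier.wait()            # wait until every job has written the global
    seen[name] = source_file  # read it back: may be another job's value


def run_shared(names):
    barrier = threading.Barrier(len(names))
    seen = {}
    threads = [
        threading.Thread(target=parse_shared, args=(name, barrier, seen))
        for name in names
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen


seen = run_shared(["a.dm", "b.dm"])

# Jobs whose state was overwritten by another job.
clobbered = [name for name, value in seen.items() if value != name]
print("clobbered jobs:", clobbered)
```

Both jobs read back whichever name was written last, so one of them ends up working with the other job's file name. A real parser has far more state than one string, which is why the result there is a crash rather than a wrong label.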

Many cores

How do we fix this mess? With a big rewrite, right? No, it only takes 3 extra lines of code. Jai standardizes the context, which is essentially state that is not shared between threads. It holds information such as which allocator should be used, the thread id, and so on.

We can add fields to the context at compile time with #add_context, like so:

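#import "Basic"; // Provides print.

// Add a new field to the context, with a default value:
#add_context meaning_of_life: int = 42;

// Use context:
do_something :: () {
    print("Meaning of life is: %\n", context.meaning_of_life);
    context.meaning_of_life = 84; // Writes to the context, not to a shared global.
}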
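For readers who have not used Jai, Python's threading.local behaves in a loosely similar way and can serve as a rough analogy for a context field. This is a sketch of the idea, not a description of how Jai implements the context. The Context class and the results list are assumptions made for the example.

```python
import threading


class Context(threading.local):
    """Per-thread state: __init__ runs once in each thread that touches it."""

    def __init__(self):
        self.meaning_of_life = 42  # default value, like the #add_context field


context = Context()


def do_something(results, index):
    before = context.meaning_of_life  # a new thread starts from the default
    context.meaning_of_life = 84      # only this thread's copy changes
    results[index] = (before, context.meaning_of_life)


results = [None, None]
threads = [
    threading.Thread(target=do_something, args=(results, i)) for i in range(2)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)
print(context.meaning_of_life)  # the main thread's copy was never modified
```

Each worker sees the default first and then its own write, and the main thread's value is untouched by either of them.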

Knowing that, we can wrap the parser state in a struct and then add a field of that type to the context. This way every thread has its own copy of the parser state in the context:

#scope_file // Internal state:

// Wrap all the global variables into a struct:
Parse_Context :: struct {
    merge_code_lines      := false;
    last_empty            := false;
    collect_empty         := false;
    has_metadata_block    := false;
    last_indentation      := 0;
    current_indentation   := 0;
    source_line           := 0;
    read_time             := 0;

    source_file : string;
    document    : *Document;
    last_node   : *Node;
    node_pool   : Pool(Node, 128);
    text_pool   : Pool(Text, 512);
}

#add_context parse_context: Parse_Context;

Great, now every thread has its own Parse_Context, which contains the parser state. However, we now have to find every use of this state and change it so that it references the context.

For example:

// This
last_empty       = false;
last_indentation = current_indentation;

// Has to be changed to this:
context.parse_context.last_empty       = false;
context.parse_context.last_indentation = context.parse_context.current_indentation;

This is fine, but it's a lot of work. Luckily there's one more thing we can do. Jai has another useful keyword, using, which essentially brings all members of the given scope into the current scope. So if we add the following line to our solution:

using context.parse_context;

We do not have to change anything else. All members of parse_context can now be treated as if they were defined in the current scope, so nothing else in the parser module needs editing. 3 lines of code (4 if we include the closing brace) is all it took to make the original parser state thread safe!
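The whole pattern (state wrapped in a struct, one instance per thread, short names inside the parser) can be sketched in Python as a rough analogy. Everything here is an assumption made for illustration: ParseContext holds only a few of the real fields, the parse function is a toy that tracks line count and indentation, and the .dm file names are invented. The local alias state plays a role loosely similar to using, although Python has no real equivalent of that keyword.

```python
import threading
from dataclasses import dataclass


@dataclass
class ParseContext:
    # A few of the fields from Parse_Context, with the same defaults.
    last_empty: bool = False
    last_indentation: int = 0
    current_indentation: int = 0
    source_line: int = 0
    source_file: str = ""


class Context(threading.local):
    # __init__ runs once per thread, so each thread gets a fresh ParseContext.
    def __init__(self):
        self.parse_context = ParseContext()


context = Context()


def parse(name, lines, barrier, out):
    """Toy parse job: counts lines and tracks indentation."""
    state = context.parse_context  # local alias for this thread's state
    state.source_file = name
    barrier.wait()                 # make the jobs overlap on purpose
    for line in lines:
        state.source_line += 1
        state.last_empty = line.strip() == ""
        state.last_indentation = state.current_indentation
        state.current_indentation = len(line) - len(line.lstrip(" "))
    out[name] = (state.source_file, state.source_line, state.current_indentation)


def run(files):
    barrier = threading.Barrier(len(files))
    out = {}
    threads = [
        threading.Thread(target=parse, args=(name, lines, barrier, out))
        for name, lines in files.items()
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out


files = {
    "a.dm": ["title", "  indented"],
    "b.dm": ["one", "two", "    three"],
}
results = run(files)
print(results)
```

Even though the jobs are forced to overlap, each one reports its own file name, line count and indentation, because no parser state is shared between threads.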

Let's compare. Single-threaded parser state:

merge_code_lines      := false;
last_empty            := false;
collect_empty         := false;
has_metadata_block    := false;
last_indentation      := 0;
current_indentation   := 0;
source_line           := 0;
read_time             := 0;

source_file : string;
document    : *Document;
last_node   : *Node;
node_pool   : Pool(Node, 128);
text_pool   : Pool(Text, 512);

Multi-threaded parser state:

Parse_Context :: struct {
    merge_code_lines      := false;
    last_empty            := false;
    collect_empty         := false;
    has_metadata_block    := false;
    last_indentation      := 0;
    current_indentation   := 0;
    source_line           := 0;
    read_time             := 0;

    source_file : string;
    document    : *Document;
    last_node   : *Node;
    node_pool   : Pool(Node, 128);
    text_pool   : Pool(Text, 512);
}

#add_context parse_context: Parse_Context;
using context.parse_context;

I changed 3 lines instead of 200+, and everything worked like a charm! This version of the parser was sub-optimal and has been improved since, but the same trick is still used to keep the current version thread friendly.